littledgg
Contributor

No description provided.

paddle-bot bot commented Aug 12, 2025

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Aug 12, 2025
Comment on lines +1485 to +1487
if self.cudagraph_capture_prefill:
    self.capture_model_prefill()

Collaborator

Add this in gpu_worker instead; it sits at the same level as the original gpu_model_runner.

@@ -1007,6 +1087,165 @@ def initialize_attn_backend(self) -> None:

self.attn_backends.append(attn_backend)

def _dummy_run_prefill(
Collaborator

Does the whole dummy run need to be rewritten? Wouldn't rewriting just _dummy_prefill_inputs_prefill be enough?

decode_exists = self.exist_decode()
paddle.distributed.all_gather_object(only_prefill_batch_list, not decode_exists)
only_prefill_batch = all(only_prefill_batch_list)
self.fd_config.parallel_config.moe_phase.phase = "prefill" if only_prefill_batch else "decode"
Collaborator

Why does moe_phase.phase need to be changed?
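
For context on this question, here is a minimal sketch of the decision the quoted lines encode: the phase becomes "prefill" only when no rank reports a decode request. `choose_moe_phase` is a hypothetical helper written for illustration and is not part of this PR; the real code gathers the per-rank flags with `paddle.distributed.all_gather_object`, which this sketch replaces with a plain list.

```python
def choose_moe_phase(decode_exists_per_rank):
    """Hypothetical helper: pick the MoE phase from per-rank decode flags."""
    # Each rank contributes `not decode_exists`, mirroring the list gathered in the diff.
    only_prefill_batch_list = [not exists for exists in decode_exists_per_rank]
    # A single rank with a decode request is enough to force the "decode" phase.
    only_prefill_batch = all(only_prefill_batch_list)
    return "prefill" if only_prefill_batch else "decode"


if __name__ == "__main__":
    print(choose_moe_phase([False, False]))  # no rank has decode requests
    print(choose_moe_phase([False, True]))   # one rank has decode requests
```

Under these assumptions, the first call yields "prefill" and the second yields "decode", so any rank holding a decode request switches every rank to the decode phase.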

Comment on lines +560 to +573
full_length = min(
    num_tokens // batch_size,
    self.parallel_config.max_model_len - max_dec_len,
)

# NOTE(wanglongzhi): When the full length is too large, DeepEP's buffer is not large enough, which causes NaN in the results.
# TODO(wanglongzhi): Figure out the accurate buffer size of DeepEP.
if self.fd_config.parallel_config.enable_expert_parallel:
    full_length = min(full_length, 32)

input_length = int(full_length * self.cache_config.kv_cache_ratio)
block_num = (
    input_length + self.cache_config.block_size - 1
) // self.cache_config.block_size + self.cache_config.enc_dec_block_num
Collaborator

Does this logic guarantee that input_length equals the num_tokens we want to capture?
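
The arithmetic in the quoted lines can be checked in isolation. The sketch below copies the same formulas into a standalone function; `dummy_prefill_lengths` and all the numeric values are assumptions made up for illustration and are not taken from the PR or from any real configuration.

```python
def dummy_prefill_lengths(num_tokens, batch_size, max_model_len, max_dec_len,
                          kv_cache_ratio, block_size, enc_dec_block_num,
                          enable_expert_parallel=False):
    """Hypothetical helper reproducing the length arithmetic from the quoted diff."""
    full_length = min(num_tokens // batch_size, max_model_len - max_dec_len)
    if enable_expert_parallel:
        # Same cap of 32 as in the quoted code.
        full_length = min(full_length, 32)
    input_length = int(full_length * kv_cache_ratio)
    # Ceiling division by block_size, plus the extra encoder/decoder blocks.
    block_num = (input_length + block_size - 1) // block_size + enc_dec_block_num
    return full_length, input_length, block_num


if __name__ == "__main__":
    # Illustrative values only.
    print(dummy_prefill_lengths(512, 1, 2048, 1024, 0.75, 64, 2))
    print(dummy_prefill_lengths(512, 1, 2048, 1024, 0.75, 64, 2,
                                enable_expert_parallel=True))
```

With these made-up values, a request for 512 tokens gives an input_length of 384, and of 24 once the expert-parallel cap applies. So, at least in this sketch, input_length matches num_tokens only when kv_cache_ratio is 1, batch_size is 1, and neither the max_model_len bound nor the cap takes effect.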

@@ -909,6 +967,28 @@ def initialize_forward_meta(self):
and not (prefill_exists if prefill_exists is not None else self.exist_prefill())
Collaborator

The pointer of the seq_lens_encoder tensor changes. Print the other tensors output by the get block shape kernel as well and take a look.
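
As a rough illustration of the kind of check suggested here, assuming the concern is that a captured graph replays against fixed buffer addresses: the sketch below uses Python's standard-library `array` as a stand-in for a device tensor and does not call any Paddle API. It compares the buffer address after an in-place update with the address after rebinding the name to a new buffer. The address stability of same-length slice assignment is CPython behavior, not a language guarantee.

```python
from array import array


def buffer_address(buf):
    # array.buffer_info() returns (address, item_count); keep only the address.
    return buf.buffer_info()[0]


seq_lens = array("i", [0, 0, 0, 0])
captured_address = buffer_address(seq_lens)  # address seen at "capture" time

# In-place update: a same-length slice assignment copies into the existing buffer.
seq_lens[:] = array("i", [5, 7, 0, 0])
in_place_stable = buffer_address(seq_lens) == captured_address

# Rebinding: the name now refers to a freshly allocated buffer.
old_buffer = seq_lens  # kept alive so its address cannot be reused
seq_lens = array("i", [5, 7, 0, 0])
rebound_stable = buffer_address(seq_lens) == captured_address

if __name__ == "__main__":
    print("in-place update keeps address:", in_place_stable)
    print("rebinding keeps address:", rebound_stable)
```

Printing the address of each tensor before and after the update step, in the same way, is one way to see which buffers are being reallocated rather than updated in place.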
